Abstract. We consider approximation problems for a special space of $d$-variate
functions. We show that the problems have a small number of active variables, as
has been postulated in the past using concentration of measure arguments. We also
show that, depending on the norm for measuring the error, the problems are strongly
polynomially or quasi-polynomially tractable even in the model of computation where
functional evaluations have a cost exponential in the number of active variables.



1. Introduction

This paper is inspired by [1], where the importance of a special class of multivariate
functions was advocated, and by recent results on tractability of problems dealing with
infinite-variate functions, see [TJ |2J EJ EJ EJ QUI EH EE5J ESI EZ], where the cost of an
algorithm depends on the number of active variables that it uses.

The selection of functions in [1] was based on a particular choice of the metric used
in the space of the variables $x_i$ of the functions and on the smoothness of the functions.
Here we consider the case where the $x_i$ denote features of some objects. Adding new
features will increase the distance in general, and this increase can grow substantially
with the dimension. For example, if $x_i \in [0,1]$ for $i = 1, \dots, d$, then the average squared
Euclidean distance of two points grows proportionally to the dimension $d$:

\[
  \int_{[0,1]^d} \int_{[0,1]^d} \sum_{i=1}^d (x_i - y_i)^2 \, dx \, dy = O(d).
\]

This unbounded growth shows that the Euclidean distance cannot approximate any
distance function between two objects for large $d$. This is why it was suggested in [3] to
use a scaled Euclidean distance to characterize the dissimilarity of two objects based
on features $x_1, \dots, x_d$:

\[
  \mathrm{dist}(x, y) = \Big( \frac{1}{d} \sum_{i=1}^d (x_i - y_i)^2 \Big)^{1/2}.
\]
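As a small numerical illustration of the two distances discussed above, the following sketch estimates the average squared Euclidean distance on the unit cube with a midpoint rule and evaluates the scaled distance. The function names and the grid size are illustrative assumptions, not part of the paper; the per-coordinate value of the double integral is 1/6, so the unscaled average is d/6.

```python
import math


def mean_sq_diff_1d(n: int = 200) -> float:
    """Midpoint-rule estimate of the double integral of (x - y)^2 over [0,1]^2."""
    pts = [(i + 0.5) / n for i in range(n)]
    return sum((x - y) ** 2 for x in pts for y in pts) / n**2


def avg_sq_euclidean(d: int, n: int = 200) -> float:
    """Average squared Euclidean distance on [0,1]^d.

    The d coordinates contribute additively, so the value is d times
    the one-dimensional integral (about d / 6).
    """
    return d * mean_sq_diff_1d(n)


def scaled_dist(x, y) -> float:
    """Scaled Euclidean distance: sqrt((1/d) * sum_i (x_i - y_i)^2)."""
    d = len(x)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / d)
```

The unscaled average grows linearly in d, whereas the scaled distance between opposite corners of the cube stays equal to one in every dimension.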



The continuity of functions considered in pE] was Lipschitz continuity based on
the scaled Euclidean distance. For differentiable functions, this leads to the condition

\[
  \sum_{i=1}^d \Big( \frac{\partial f}{\partial x_i} \Big)^2 \le \frac{L_1^2}{d},
\]

where $L_1$ is the Lipschitz constant of $f$ with respect to the scaled Euclidean distance.
A model example is the mean function

\[
  f(x) = \frac{1}{d} \sum_{i=1}^d x_i.
\]

This function has Lipschitz constant $L_1 = 1$. Consequently the gradient satisfies

\[
  \| \nabla f(x) \|_2 = \frac{1}{d^{1/2}}.
\]

It follows that $f$ is approximated with an $O(d^{-1/2})$ error by the constant 0.5, i.e., the
values of $f$ are concentrated around 0.5. This concentration phenomenon for general
Lipschitz-continuous functions was established by Lévy in [8].

Higher order approximations can be derived in the case when higher order Lipschitz 
constants are finite, i.e., where for some m > one has 



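Using the example of the mean function one has

\[
  \Big( \frac{1}{d} \sum_{i=1}^d x_i - \frac{1}{2} \Big)^2 = O(1/d).
\]

From this one gets the first order (additive function) approximation

\[
  \frac{2}{d^2} \sum_{i<j} x_i x_j = \frac{1}{d} \sum_{i=1}^d (1 - x_i/d)\, x_i - \frac{1}{4} + O(1/d).
\]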
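The additive approximation above rests on an exact algebraic identity: the pairwise sum equals the additive part plus the squared deviation of the mean from 1/2, and only that last term is treated as the $O(1/d)$ remainder. The sketch below checks the identity numerically; the function names are illustrative assumptions.

```python
import random


def pairwise_term(x):
    """(2 / d^2) * sum_{i<j} x_i x_j."""
    d = len(x)
    return 2.0 / d**2 * sum(x[i] * x[j] for i in range(d) for j in range(i + 1, d))


def additive_part(x):
    """(1/d) * sum_i (1 - x_i/d) x_i - 1/4, the first order (additive) approximation."""
    d = len(x)
    return sum((1.0 - xi / d) * xi for xi in x) / d - 0.25


def remainder(x):
    """(mean(x) - 1/2)^2, the term that is O(1/d) on average over the cube."""
    d = len(x)
    return (sum(x) / d - 0.5) ** 2
```

For points drawn uniformly from the cube, the expected value of the remainder is the variance of the mean, 1/(12 d), which is consistent with the stated order.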

A similar approximation is obtained for the average squared distance
$d^{-2} \sum_{i<j} (x_i - x_j)^2$. Both of these functions satisfy a higher order Lipschitz
condition with respect to the scaled norm introduced earlier.

Classes of such functions and the particular scaling by $1/d^m$, where $m$ is equal to the
number of involved variables, are related to the weighted reproducing kernel Hilbert
space $\mathcal{H}_d$ of multivariate functions on $[0,1]^d$ with the reproducing kernel given by

\[
  \mathcal{K}_d(x, y) = 1 + \sum_{u \ne \emptyset} d^{-|u|} \prod_{j \in u} \min(x_j, y_j).
\]

Here the sum is over all nonempty subsets $u$ of $\{1, \dots, d\}$. This is why we consider such spaces in
the current paper. It is well known, see, e.g., [7], that functions from that space have
an ANOVA-like representation of the form

\[
  f(x) = f_\emptyset + \sum_{u \ne \emptyset} f_u(x),
\]

where each component $f_u$ depends on exactly the variables listed in $u$. Hence, $u$ is
the list of active variables in $f_u$, and the scaling parameter $m$ is equal to $|u|$. The
corresponding norm of $f$ is given by

\[
  \|f\|_{\mathcal{H}_d}^2 = |f_\emptyset|^2 + \sum_{u \ne \emptyset} d^{|u|}
  \Big\| \frac{\partial^{|u|} f_u}{\prod_{j \in u} \partial x_j} \Big\|_{L_2}^2.
\]

As already mentioned, it was also postulated in [1] that functions of this form are
well approximated by sums of those components $f_u$ that depend on small numbers of
variables, i.e., with $u$ of small cardinality, or just by a constant function. We show, in a
more quantified way, that this is true for approximation problems with errors measured
in a norm of another Hilbert space that also has a tensor product form. That is,
we show that to approximate $f$ with an error not exceeding $\varepsilon \cdot \|f\|_{\mathcal{H}_d}$, it is enough to
consider only those terms $f_u$ that depend on at most $|u| \le m(\varepsilon, d)$ variables, where
$m(\varepsilon, d)$ grows very slowly with $1/\varepsilon$ and/or decreases to zero as $d$ tends to infinity.
More precisely, for general tensor product spaces (including the $L_2$ space), we have

\[
  m(\varepsilon, d) \le \min\Big( d, \; c \cdot \frac{\ln(1/\varepsilon)}{\ln(\ln(1/\varepsilon))} \Big)
\]

for a known constant $c > 0$ that does not depend on $\varepsilon$ and $d$. For instance, for any
$d \in \mathbb{N}_+$ and the error demand $\varepsilon = 10^{-q}$, we have

\[
  m(10^{-2}, d) \le 5, \quad m(10^{-4}, d) \le 8, \quad \text{and} \quad m(10^{-8}, d) \le 14.
\]

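Suppose next that spaces $\mathcal{H}_d$ and $\mathcal{G}_d$ satisfy the following assumption: there exists
$C < \infty$ such that

\[
  \|f\|_{\mathcal{G}_d}^2 \le C \cdot \sum_{u} \|f_u\|_{\mathcal{G}_d}^2
  \quad \text{for all } f = \sum_{u} f_u \in \mathcal{H}_d. \tag{1}
\]

Then $m(\varepsilon, d)$ has an even smaller upper bound,

\[
  m(\varepsilon, d) \le \min\Big( d, \; \Big\lceil \frac{\ln(C/\varepsilon^2)}{\ln(d/C_0^2)} \Big\rceil - 1 \Big),
\]

with the constant $C_0$ defined in (4).
Hence for a fixed error demand $\varepsilon$, $m(\varepsilon, d) = O(1/\ln(d))$ as $d \to \infty$.
Actually, we prove these results for reproducing kernels of the form

\[
  \mathcal{K}_d(x, y) = 1 + \sum_{u \ne \emptyset} d^{-|u|} \prod_{j \in u} K(x_j, y_j)
\]

for a general class of univariate kernels $K : D \times D \to \mathbb{R}$, including of course
$K(x, y) = \min(x, y)$ and $D = [0, 1]$.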

We also study the tractability of approximation problems for algorithms that can
use arbitrary linear functional evaluations. However, as has been done in the recent
study of infinite-variate problems, we assume that the cost of each such evaluation
depends on the number $k$ of active variables and is given by $\$(k)$. Under the general
tensor product assumption, approximation is quasi-polynomially tractable, and it is
strongly polynomially tractable if (1) is satisfied. These results hold even when the cost
function $\$$ is exponential. We also find a sharp upper bound on the exponent of strong
tractability. Approximation is weakly tractable even when $\$$ is doubly exponential.



2. Basic Definitions

2.1. Space of d-Variate Functions. Let $D \subseteq \mathbb{R}$ be a Borel measurable set and let
$H = H(K)$ be a reproducing kernel Hilbert space (RKH space for short) of functions
$f : D \to \mathbb{R}$ whose kernel is denoted by $K$.
We assume that

\[
  1 \notin H,
\]

where $1$ denotes the constant function $f(x) = 1$ for all $x$.

In what follows we write $[1..d]$ to denote the set of positive integers not exceeding $d$,

\[
  [1..d] := \{ n \in \mathbb{N}_+ : n \le d \},
\]

and use $u$, $v$ to denote subsets of $[1..d]$. Consider now the weights

\[
  \gamma_{d,u} := d^{-|u|} \quad \text{for } u \subseteq [1..d]. \tag{2}
\]

Clearly $\gamma_{d,\emptyset} = 1$.

The weighted space of $d$-variate functions $f : D^d \to \mathbb{R}$ under consideration is the
RKH space $\mathcal{H}_d$ whose kernel is given by

\[
  \mathcal{K}_d(x, y) := \sum_{u \subseteq [1..d]} \gamma_{d,u} \cdot K_u(x, y)
  \quad \text{and} \quad
  K_u(x, y) = \prod_{j \in u} K(x_j, y_j),
\]

with the convention that $K_\emptyset \equiv 1$.
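To make the kernel definition concrete, here is a minimal sketch that evaluates the weighted kernel by summing over all subsets and compares it with the product $\prod_{j=1}^d (1 + K(x_j, y_j)/d)$. The product form is an elementary consequence of the weights $d^{-|u|}$ and is not stated in the text; the function names and the default Wiener kernel are illustrative assumptions.

```python
from itertools import combinations
from math import prod


def wiener(x: float, y: float) -> float:
    """Univariate Wiener kernel K(x, y) = min(x, y) on [0, 1]."""
    return min(x, y)


def kernel_sum(x, y, k=wiener) -> float:
    """Weighted kernel as the sum over all subsets u of d^{-|u|} * prod_{j in u} K(x_j, y_j)."""
    d = len(x)
    total = 0.0
    for size in range(d + 1):
        for u in combinations(range(d), size):
            # The empty subset contributes 1 (empty product, weight 1).
            total += d ** (-size) * prod(k(x[j], y[j]) for j in u)
    return total


def kernel_product(x, y, k=wiener) -> float:
    """Same kernel via the product form prod_j (1 + K(x_j, y_j) / d)."""
    d = len(x)
    return prod(1.0 + k(a, b) / d for a, b in zip(x, y))
```

The subset sum has $2^d$ terms, so it is only practical for small $d$; the product form costs $O(d)$ operations.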

For each $u$, by $H_u$ we denote the RKH space whose kernel is equal to $K_u$. Clearly
$H_\emptyset = \mathrm{span}\{1\}$ and $H_u \simeq H^{\otimes |u|}$ for $u \ne \emptyset$. It is well known that the spaces $H_u$, as
subspaces of $\mathcal{H}_d$, are mutually orthogonal and any $f \in \mathcal{H}_d$ has the unique representation

\[
  f(x) = \sum_{u \subseteq [1..d]} f_u(x) \quad \text{with } f_u \in H_u
\]

and

\[
  \|f\|_{\mathcal{H}_d}^2 = \sum_{u \subseteq [1..d]} \gamma_{d,u}^{-1} \|f_u\|_{H_u}^2
  = \sum_{u \subseteq [1..d]} d^{|u|} \|f_u\|_{H_u}^2.
\]

This representation is similar to the ANOVA decomposition since each term $f_u$ depends
only on the variables listed in $u$. The space considered in [1] and mentioned in the
Introduction is related to the space $\mathcal{H}_d$ with the classical Wiener kernel discussed in the
following example.

Example. Consider

\[
  D = [0, 1] \quad \text{and} \quad K(x, y) = \min(x, y).
\]

Then $H$ is the space of functions $f : [0, 1] \to \mathbb{R}$ that vanish at zero, are absolutely
continuous, and have $f' \in L_2([0, 1])$. The norm in $H$ is given by

\[
  \|f\|_H^2 = \int_0^1 |f'(x)|^2 \, dx.
\]

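For $u \ne \emptyset$, $H_u$ consists of functions that depend only on the variables $x_j$ with $j \in u$,
are zero if at least one of those variables is zero, have the mixed first-order partial
derivatives bounded in the $L_2$ norm, and

\[
  \|f\|_{\mathcal{H}_d}^2 = |f(0)|^2 + \sum_{u \ne \emptyset} d^{|u|} \int_{D^{|u|}}
  \Big| \frac{\partial^{|u|}}{\prod_{j \in u} \partial x_j} f([x; u]) \Big|^2 dx
  \quad \text{for } f \in \mathcal{H}_d,
\]

where $[x; u]$ is given by

\[
  [x; u] = [y_1, \dots, y_d] \quad \text{with} \quad
  y_j := \begin{cases} x_j & \text{if } j \in u, \\ 0 & \text{otherwise.} \end{cases}
\]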



2.2. Function Approximation Problems. For every $d \ge 1$, let $\mathcal{G}_d$ be a separable
Hilbert space of functions on $D^d$ such that $\mathcal{H}_d$ is continuously embedded in it. We
denote the corresponding embedding operator by $S_d$, i.e.,

\[
  S_d : \mathcal{H}_d \to \mathcal{G}_d \quad \text{and} \quad S_d(f) = f.
\]

We assume that $S_d$ and $\mathcal{G}_d$ have tensor product forms, i.e., for every $u$ and every
$f(x) = \prod_{j \in u} f_j(x_j)$ with $f_j \in H$, we have

\[
  \|f\|_{\mathcal{G}_d} = \prod_{j \in u} \|f_j\|_{\mathcal{G}_1}. \tag{3}
\]

For simplicity of presentation we also assume that

\[
  \|1\|_{\mathcal{G}_1} = 1 \quad \text{so that} \quad \|1\|_{\mathcal{G}_d} = 1.
\]
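The continuity of $S_d$ is equivalent to continuity of $S_1$. Indeed, let

\[
  C_0 := \sup_{\|f\|_H \le 1} \|f\|_{\mathcal{G}_1} < \infty. \tag{4}
\]

Then for every $u$ we have

\[
  \sup_{\|f\|_{H_u} \le 1} \|f\|_{\mathcal{G}_d} = C_0^{|u|}
\]

and

\[
  \|S_d\|^2 \le \sum_{u \subseteq [1..d]} \gamma_{d,u} \cdot C_0^{2|u|}
  = \sum_{k=0}^d \binom{d}{k} C_0^{2k} d^{-k} = \Big( 1 + \frac{C_0^2}{d} \Big)^d,
\]

since

\[
  \|f\|_{\mathcal{G}_d}^2 \le \Big( \sum_{u \subseteq [1..d]} \|f_u\|_{\mathcal{G}_d} \Big)^2
  \le \sum_{u \subseteq [1..d]} \gamma_{d,u} \cdot C_0^{2|u|} \cdot \|f\|_{\mathcal{H}_d}^2.
\]

Clearly

\[
  1 \le \|S_d\| \le e^{C_0^2/2} \quad \text{for every } d,
\]

which means that the corresponding approximation problem is properly scaled.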
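The scaling claim can be checked numerically. The sketch below, under the assumption that the bound on $\|S_d\|^2$ is the binomial sum $\sum_{k=0}^d \binom{d}{k} C_0^{2k} d^{-k}$, compares that sum with its closed form $(1 + C_0^2/d)^d$ and with the dimension-independent limit $e^{C_0^2}$. The function name and the sample value of $C_0^2$ are illustrative.

```python
import math


def embedding_norm_sq_bound(d: int, c0_sq: float) -> float:
    """Sum over k of binom(d, k) * C_0^{2k} * d^{-k}, an upper bound on ||S_d||^2."""
    return sum(math.comb(d, k) * c0_sq**k / d**k for k in range(d + 1))


if __name__ == "__main__":
    for d in (1, 5, 50):
        bound = embedding_norm_sq_bound(d, 0.5)
        # Closed form and the d-independent cap exp(C_0^2).
        print(d, bound, (1 + 0.5 / d) ** d, math.exp(0.5))
```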
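Note also that the condition (1) holds if

\[
  \langle 1, f \rangle_{\mathcal{G}_1} = 0 \quad \text{for all } f \in H. \tag{5}
\]

Actually, under (5) we have

\[
  \Big\| \sum_{u \subseteq [1..d]} f_u \Big\|_{\mathcal{G}_d}^2 = \sum_{u \subseteq [1..d]} \|f_u\|_{\mathcal{G}_d}^2
  \quad \text{for all } f \in \mathcal{H}_d.
\]

Then we can get a better estimate of the norm of $S_d$:

\[
  \|S_d f\|_{\mathcal{G}_d}^2 \le \sum_{u \subseteq [1..d]} d^{|u|} \cdot \|f_u\|_{H_u}^2 \cdot C_0^{2|u|} \cdot d^{-|u|}
  \le \|f\|_{\mathcal{H}_d}^2 \cdot \max_{k \le d} C_0^{2k} \cdot d^{-k}.
\]

Since the estimate above is sharp, we conclude that

\[
  \|S_d\| = \max_{k \le d} C_0^{k} \cdot d^{-k/2}.
\]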

The class of such approximation problems contains the following weighted $L_2$ approximation.

2.2.1. Weighted $L_2$ Approximation. Let $\rho$ be a given probability density function (p.d.f.
for short) on $D$. Without loss of generality, suppose that $\rho$ is positive (a.e.) on $D$. Then
the space $L_2(\rho_d, D^d)$ of functions with finite

\[
  \|f\|_{L_2(\rho_d, D^d)}^2 = \int_{D^d} |f(x)|^2 \cdot \rho_d(x) \, dx
\]

is a well defined Hilbert space. Here by $\rho_d$ we mean

\[
  \rho_d(x) = \prod_{j=1}^d \rho(x_j).
\]

We then take

\[
  \mathcal{G}_d = L_2(\rho_d, D^d).
\]

It is well known that the continuity of $S_1$ is equivalent to the continuity of the following
integral operator

\[
  W_1 := S_1^* \circ S_1 : H \to H, \qquad W_1(f)(x) = \int_D f(y) \cdot K(x, y) \cdot \rho(y) \, dy,
\]

since then $\|S_1\|^2$ is equal to the largest eigenvalue of $W_1$, i.e.,

\[
  C_0^2 = \max \{ \lambda : \lambda \in \mathrm{spect}(W_1) \}.
\]

Then

\[
  1 \le \|S_d\|^2 \le \Big( 1 + \frac{C_0^2}{d} \Big)^d.
\]



The condition (5) is now equivalent to

\[
  \int_D f(x) \cdot \rho(x) \, dx = 0 \quad \text{for all } f \in H,
\]

which is satisfied by various spaces of periodic functions.

2.3. Algorithms, Errors and Cost. Since problems considered in this paper are
defined over Hilbert spaces, we can restrict attention to linear algorithms only, see
e.g., [14], of the form

\[
  A_n(f) = \sum_{j=1}^n L_j(f) \cdot a_j,
\]

where $L_j$ are continuous linear functionals and $a_j \in \mathcal{G}_d$. In the worst case setting
considered in this paper, the error of an algorithm $A_n$ is defined by

\[
  \mathrm{error}(A_n; \mathcal{H}_d, \mathcal{G}_d) := \sup_{f \in \mathcal{H}_d} \frac{\|f - A_n(f)\|_{\mathcal{G}_d}}{\|f\|_{\mathcal{H}_d}}.
\]

So far, in the complexity study of problems with finitely many variables, it has
been assumed that the cost of an algorithm is given by the number $n$ of functional
evaluations. We believe that, similarly to problems with infinitely many variables, the
cost of computing $L(f)$ should depend on the number of active variables of $L$. More
precisely, for given $L \in \mathcal{H}_d^*$ let $h_L \in \mathcal{H}_d$ be its generator, i.e.,

\[
  L(f) = \langle f, h_L \rangle_{\mathcal{H}_d} \quad \text{for all } f \in \mathcal{H}_d.
\]

Then $h_L = \sum_{u \subseteq [1..d]} h_u$ with $h_u \in H_u$,

\[
  \mathrm{Act}(L) := \Big| \bigcup \{ u : h_u \ne 0 \} \Big|
\]

is the number of active variables in $L$, and the cost of evaluating $L(f)$ is equal to

\[
  \$(\mathrm{Act}(L)),
\]

where $\$ : \mathbb{N} \to \mathbb{R}_+$ is a given cost function. The only assumptions that we make at
this point are

\[
  \$(0) \ge 1 \quad \text{and} \quad \$(k) \le \$(k+1) \quad \text{for all } k \in \mathbb{N}.
\]

This includes

\[
  \$(k) = (k+1)^q, \quad \$(k) = e^{q \cdot k}, \quad \text{and} \quad \$(k) = e^{e^{q \cdot k}}
\]

for some $q \ge 0$. Then the (information) cost of $A_n = \sum_{j=1}^n L_j(f) \cdot a_j$ is given by

\[
  \mathrm{cost}(A_n) := \sum_{j=1}^n \$(\mathrm{Act}(L_j)).
\]
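The cost model can be sketched in a few lines. In the following illustration an algorithm is represented only by the list of active-variable counts $\mathrm{Act}(L_j)$ of its functionals; that representation and all function names are assumptions made for the example, while the three cost functions follow the families listed above.

```python
import math
from typing import Callable, Iterable


def polynomial_cost(q: float) -> Callable[[int], float]:
    """Cost function $(k) = (k + 1)^q."""
    return lambda k: (k + 1) ** q


def exponential_cost(q: float) -> Callable[[int], float]:
    """Cost function $(k) = exp(q * k)."""
    return lambda k: math.exp(q * k)


def doubly_exponential_cost(q: float) -> Callable[[int], float]:
    """Cost function $(k) = exp(exp(q * k))."""
    return lambda k: math.exp(math.exp(q * k))


def algorithm_cost(active_counts: Iterable[int], cost_fn: Callable[[int], float]) -> float:
    """Information cost: sum of $(Act(L_j)) over the functionals L_j used."""
    return sum(cost_fn(k) for k in active_counts)
```

For example, an algorithm using one functional with no active variables, two with one active variable and one with two active variables has cost 1 + 2 + 2 + 3 = 8 under the linear cost $\$(k) = k + 1$.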

The tractability results obtained so far for functions with finite numbers of variables
correspond to $\$ \equiv 1$. In our opinion, it makes sense to assume that the cost function is
at least linear, i.e.,

\[
  \$(k) \ge c \cdot (k + 1), \quad k \in \mathbb{N}.
\]

2.4. Information Complexity and Tractability. By (information) complexity we
mean the minimal information cost among all algorithms with errors not exceeding a
given error demand. That is, for $\varepsilon \in (0, 1)$,

\[
  \mathrm{comp}(\varepsilon; \mathcal{H}_d, \mathcal{G}_d) := \inf \{ \mathrm{cost}(A) : \mathrm{error}(A; \mathcal{H}_d, \mathcal{G}_d) \le \varepsilon \}.
\]

We now recall the definitions of three kinds of tractability. For a detailed discussion
of tractability concepts and results, we refer to the excellent monographs [11, 12]. We stress,
however, that those results pertain to the constant cost function, $\$ \equiv 1$.

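We say that the problem $S_d$ (or more precisely the sequence of problems $S_d$) is
polynomially tractable if there exist $c, p, q \ge 0$ such that

\[
  \mathrm{comp}(\varepsilon; \mathcal{H}_d, \mathcal{G}_d) \le c \cdot \frac{d^q}{\varepsilon^p}
  \quad \text{for all } \varepsilon \in (0, 1) \text{ and } d \in \mathbb{N}_+.
\]

It is strongly polynomially tractable iff the above inequality holds with $q = 0$, and weakly
tractable iff

\[
  \limsup_{d + 1/\varepsilon \to \infty} \frac{\ln(\mathrm{comp}(\varepsilon; \mathcal{H}_d, \mathcal{G}_d))}{d + 1/\varepsilon} = 0.
\]

When the problem is strongly polynomially tractable then

\[
  p^{\mathrm{str}} := \inf \Big\{ p : \sup_{\varepsilon, d} \varepsilon^p \cdot \mathrm{comp}(\varepsilon; \mathcal{H}_d, \mathcal{G}_d) < \infty \Big\}
\]

is called the exponent of strong tractability.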

There is also a concept of quasi-polynomial tractability introduced recently, see [3].
It is weaker than polynomial tractability and stronger than weak tractability. More
precisely, the problem is quasi-polynomially tractable if there exist $c, t \ge 0$ such that

\[
  \mathrm{comp}(\varepsilon; \mathcal{H}_d, \mathcal{G}_d) \le c \cdot \exp\big( t \cdot (1 + \ln(d)) \cdot (1 + \ln(1/\varepsilon)) \big)
  \quad \text{for all } \varepsilon \in (0, 1) \text{ and } d \in \mathbb{N}_+.
\]

This means that $\mathrm{comp}(\varepsilon; \mathcal{H}_d, \mathcal{G}_d) \le c \cdot (e \cdot d)^{t \cdot (1 + \ln(1/\varepsilon))}$. The significance of
quasi-polynomial tractability is that for some applications, $d$ can be very large but $\varepsilon$ need
not be very small, say $\varepsilon = 10^{-2}$. Then the complexity of the problem is bounded by a
polynomial in $d$.
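The two equivalent forms of the quasi-polynomial bound can be compared directly. The sketch below is only an illustration: the constants `c` and `t` are hypothetical placeholders, since the definition merely asserts that some such constants exist.

```python
import math


def quasi_poly_bound(eps: float, d: int, c: float = 1.0, t: float = 1.0) -> float:
    """c * exp(t * (1 + ln d) * (1 + ln(1/eps)))."""
    return c * math.exp(t * (1 + math.log(d)) * (1 + math.log(1 / eps)))


def quasi_poly_bound_power_form(eps: float, d: int, c: float = 1.0, t: float = 1.0) -> float:
    """Same bound written as c * (e * d)^(t * (1 + ln(1/eps)))."""
    return c * (math.e * d) ** (t * (1 + math.log(1 / eps)))


def degree_in_d(eps: float, t: float = 1.0) -> float:
    """For fixed eps the bound is a polynomial in d of this degree."""
    return t * (1 + math.log(1 / eps))
```

For $\varepsilon = 10^{-2}$ and $t = 1$ the degree in $d$ is $1 + \ln(100)$, roughly 5.6, which illustrates why a moderate error demand keeps the bound polynomial in $d$.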

As we shall prove in the next section, the problems considered in this paper are
quasi-polynomially tractable even when the cost function $\$$ is exponential in $d$.

3. Results 

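3.1. Number of Active Variables. We are interested in a number $m = m(\varepsilon, d)$ such
that, for any $f \in \mathcal{H}_d$, the terms $f_u$ with $|u| > m$ can be neglected, i.e.,

\[
  \Big\| \sum_{|u| > m(\varepsilon, d)} f_u \Big\|_{\mathcal{G}_d}
  \le \varepsilon \cdot \Big\| \sum_{|u| > m(\varepsilon, d)} f_u \Big\|_{\mathcal{H}_d}. \tag{6}
\]

Hence, to approximate $S_d(f)$ with error bounded by $\varepsilon \sqrt{2}$, it is enough to use algorithms
with functionals $L_j$ that have $\mathrm{Act}(L_j) \le m(\varepsilon, d)$.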

We first find $m(\varepsilon, d)$ for the general tensor product space $\mathcal{G}_d$ and next for the special
case (1). To distinguish between the two cases, we will write respectively $m_1 = m_1(\varepsilon, d)$
and $m_2 = m_2(\varepsilon, d)$ instead of $m = m(\varepsilon, d)$.

3.1.1. General Case. For given $\varepsilon \in (0, 1)$ and $d \in \mathbb{N}_+$, define

\[
  m_1 = m_1(\varepsilon, d) := \min \Big\{ m : \sum_{k=m+1}^d \binom{d}{k} \Big( \frac{C_0^2}{d} \Big)^k \le \varepsilon^2 \Big\}. \tag{7}
\]

Of course, $m_1(\varepsilon, d)$ is well defined and is bounded by $d$.
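The definition (7) can be evaluated directly. The following sketch computes $m_1(\varepsilon, d)$ with exact rational arithmetic by growing the tail sum from $k = d$ downwards; the function name and the default value $C_0^2 = 1/2$ (the value used for the Wiener-kernel table later in the text) are assumptions of the example.

```python
from fractions import Fraction
from math import comb


def m1(eps, d: int, c0_sq=Fraction(1, 2)) -> int:
    """Smallest m such that sum_{k=m+1}^d binom(d, k) * (C_0^2 / d)^k <= eps^2."""
    eps_sq = Fraction(eps) ** 2
    ratio = Fraction(c0_sq) / d
    tail = Fraction(0)  # tail sum for the current m; it is 0 for m = d
    m = d
    while m > 0:
        term = comb(d, m) * ratio**m
        # Moving from m to m - 1 adds the term with index k = m to the tail.
        if tail + term > eps_sq:
            break
        tail += term
        m -= 1
    return m
```

With these assumptions, `m1(Fraction(1, 100), 100)` evaluates to 5, in line with the bound $m(10^{-2}, d) \le 5$ quoted in the Introduction, while for small dimension the value is capped by $d$.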

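Proposition 1. For every $d$, $\varepsilon \in (0, 1)$, and $f \in \mathcal{H}_d$, (6) holds with $m = m_1(\varepsilon, d)$ given
by (7). Moreover, $m_1(\varepsilon, d)$ is bounded from above by $\min(d, M)$, where $M = M(\varepsilon)$ is
the solution of

\[
  \frac{(M+1)!}{C_0^{2 \cdot (M+1)}} = \frac{e^{C_0^2}}{\varepsilon^2}.
\]

In particular, there exists a constant $C_1$ such that

\[
  m_1(\varepsilon, d) \le C_1 \cdot \frac{\ln(1/\varepsilon)}{\ln(\ln(1/\varepsilon))} \quad \text{for all } \varepsilon \le e^{-e}.
\]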

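Proof. Of course, (6) holds if $m_1(\varepsilon, d) = d$. Therefore we consider only the case when
$m_1 = m_1(\varepsilon, d) < d$. We have

\[
\begin{aligned}
  \Big\| \sum_{|u| > m_1} f_u \Big\|_{\mathcal{G}_d}
  &\le \sum_{|u| > m_1} \|f_u\|_{\mathcal{G}_d}
   \le \sum_{|u| > m_1} C_0^{|u|} \cdot \|f_u\|_{H_u} \\
  &\le \Big[ \sum_{|u| > m_1} \gamma_{d,u} \cdot C_0^{2|u|} \Big]^{1/2}
       \Big[ \sum_{|u| > m_1} \gamma_{d,u}^{-1} \cdot \|f_u\|_{H_u}^2 \Big]^{1/2} \\
  &= \Big\| \sum_{|u| > m_1} f_u \Big\|_{\mathcal{H}_d}
     \Big[ \sum_{k=m_1+1}^d \binom{d}{k} \Big( \frac{C_0^2}{d} \Big)^k \Big]^{1/2}
   \le \varepsilon \cdot \Big\| \sum_{|u| > m_1} f_u \Big\|_{\mathcal{H}_d}.
\end{aligned}
\]

This completes the proof of the first part. We now estimate the number $m_1(\varepsilon, d)$.
Observe that, for any $m < d$, we have

\[
\begin{aligned}
  \sum_{k=m+1}^d \binom{d}{k} \Big( \frac{C_0^2}{d} \Big)^k
  &= \sum_{k=m+1}^d C_0^{2k} \cdot \frac{d \cdots (d-k+1)}{d^k \cdot k!}
   \le \sum_{k=m+1}^\infty \frac{C_0^{2k}}{k!}
   = \frac{C_0^{2 \cdot (m+1)}}{(m+1)!} \sum_{j=0}^\infty C_0^{2j} \cdot \frac{(m+1)!}{(m+1+j)!} \\
  &= \frac{C_0^{2 \cdot (m+1)}}{(m+1)!} \sum_{j=0}^\infty \frac{C_0^{2j}}{j!} \binom{m+1+j}{j}^{-1}
   \le \frac{C_0^{2 \cdot (m+1)}}{(m+1)!} \cdot e^{C_0^2}.
\end{aligned}
\]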



This completes the proof. □

Remark 1. One can slightly improve the estimate of $m_1(\varepsilon, d)$ by letting $M = M(\varepsilon)$
be the minimal integer such that $C_0^2/(M+1) < 1$ and

\[
  \frac{(M+1)!}{C_0^{2 \cdot (M+1)}} \ge \frac{1}{\varepsilon^2 \cdot \big( 1 - C_0^2/(M+1) \big)}.
\]

This is because the last sum in the proof above can be bounded as follows:

\[
  \sum_{j=0}^\infty \frac{C_0^{2j}}{j!} \binom{m+1+j}{j}^{-1}
  \le \sum_{j=0}^\infty \Big( \frac{C_0^2}{m+1} \Big)^j
  = \frac{1}{1 - C_0^2/(m+1)}.
\]

We calculated the values of $\lceil M(\varepsilon) \rceil$ for $\varepsilon = 10^{-q}$ with $q = 1, \dots, 10$ for the function
approximation problem with the Wiener kernel on $[0, 1]$ and $\rho(x) = 1$. Recall that then
$C_0^2 \le 1/2$; the listed values correspond to using $1/2$ in the condition above. These values
are listed in the following table.

  q                                  1   2   3   4   5   6   7   8   9  10
  $\lceil M(10^{-q}) \rceil$         3   5   7   8  10  11  13  14  15  17
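The tabulated values can be reproduced with a short exact computation. The sketch below searches for the minimal integer $M$ satisfying the two conditions of Remark 1, using rational arithmetic to avoid rounding issues; the function name is illustrative and the value $C_0^2 = 1/2$ is the one assumed for the Wiener-kernel example.

```python
from fractions import Fraction
from math import factorial


def remark_bound(q: int, c0_sq=Fraction(1, 2)) -> int:
    """Minimal integer M with C_0^2/(M+1) < 1 and
    (M+1)! / C_0^{2(M+1)} >= 1 / (eps^2 * (1 - C_0^2/(M+1))), for eps = 10^{-q}."""
    c0_sq = Fraction(c0_sq)
    eps_sq = Fraction(1, 10 ** (2 * q))
    M = 0
    while True:
        ratio = c0_sq / (M + 1)
        if ratio < 1:
            lhs = Fraction(factorial(M + 1)) / c0_sq ** (M + 1)
            rhs = 1 / (eps_sq * (1 - ratio))
            if lhs >= rhs:
                return M
        M += 1


if __name__ == "__main__":
    print([remark_bound(q) for q in range(1, 11)])
```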



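3.1.2. Special Case (1). We now investigate the number of active variables under the
assumption (1). Then, for any $k < d$,

\[
  \Big\| \sum_{|u| > k} f_u \Big\|_{\mathcal{G}_d}^2
  \le C \cdot \sum_{|u| > k} \|f_u\|_{\mathcal{G}_d}^2
  \le C \cdot \sum_{|u| > k} C_0^{2 \cdot |u|} \cdot \gamma_{d,u} \cdot \gamma_{d,u}^{-1} \cdot \|f_u\|_{H_u}^2
  \le C \cdot \max_{|u| > k} \frac{C_0^{2 \cdot |u|}}{d^{|u|}} \cdot \Big\| \sum_{|u| > k} f_u \Big\|_{\mathcal{H}_d}^2.
\]

Therefore, for $m_2 = m_2(\varepsilon, d)$ given by

\[
  m_2 := \begin{cases}
    0 & \text{if } d \le C_0^2 \text{ and } (C_0^2/d)^d \le \varepsilon^2/C, \\
    d & \text{if } d \le C_0^2 \text{ and } (C_0^2/d)^d > \varepsilon^2/C, \\
    \min \{ k : (C_0^2/d)^{k+1} \le \varepsilon^2/C \} & \text{otherwise,}
  \end{cases} \tag{8}
\]

we have the following proposition.

Proposition 2. Suppose that (1) is satisfied. For every $d$, $\varepsilon \in (0, 1)$, and $f \in \mathcal{H}_d$, (6)
holds with $m = m_2(\varepsilon, d)$ given by (8). Moreover, for $d > C_0^2$,

\[
  m_2(\varepsilon, d) \le \min \Big( d, \; \Big\lceil \frac{\ln(C/\varepsilon^2)}{\ln(d/C_0^2)} \Big\rceil - 1 \Big)
  \quad \text{and} \quad m_2(\varepsilon, d) = O(\ln^{-1}(d)) \text{ as } d \to \infty.
\]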
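A minimal sketch of the case distinction (8) and of the closed-form bound of Proposition 2 follows. The function names, the default constants $C_0^2 = 1/2$ and $C = 1$, and the explicit cap at $d$ in the last case (justified because (6) holds trivially for $m = d$) are assumptions of this illustration.

```python
from fractions import Fraction
from math import ceil, log


def m2(eps, d: int, c0_sq=Fraction(1, 2), C=Fraction(1)) -> int:
    """m_2(eps, d) following the three cases of (8), with exact arithmetic."""
    target = Fraction(eps) ** 2 / Fraction(C)
    ratio = Fraction(c0_sq) / d
    if ratio >= 1:  # d <= C_0^2: the largest factor is (C_0^2/d)^d
        return 0 if ratio**d <= target else d
    k = 0
    while ratio ** (k + 1) > target:
        k += 1
    # Assumption: cap at d, since no component has more than d active variables.
    return min(k, d)


def m2_closed_form(eps: float, d: int, c0_sq: float = 0.5, C: float = 1.0) -> int:
    """min(d, ceil(ln(C/eps^2) / ln(d/C_0^2)) - 1), valid for d > C_0^2."""
    return min(d, ceil(log(C / eps**2) / log(d / c0_sq)) - 1)
```

For $\varepsilon = 10^{-2}$ both functions give 1 at $d = 100$ and 0 at $d = 10^6$, illustrating the decay of $m_2$ with growing dimension.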



3.2. Changing Dimension Algorithm. We consider in this section very special
algorithms that are from the family of changing dimension algorithms introduced in
[6] for integration and in [161 E] for approximation of functions with infinitely many
variables. As shown recently in [15], these algorithms yield polynomial tractability for
weighted $L_2$ approximation problems with infinitely many variables and general weights
that have decay greater than one.

These results are not applicable in this paper since the weights $\gamma_{d,u} = d^{-|u|}$ have
decay exactly one. However, these weights still allow for quasi-polynomial tractability,
and for strong polynomial tractability if (1) holds.

More precisely, let $\{(\lambda_{1,n}, \zeta_{1,n})\}_{n=1}^\infty$ be the set of eigenpairs of the operator

\[
  W_1 := S_1^* \circ S_1 : H \to H
\]

for the class $H$ of univariate functions. We assume that $\lambda_{1,n}$ are monotonically decreasing
to zero with a polynomial speed, i.e., that

\[
  \alpha := \mathrm{decay}(\{\lambda_{1,n}\}_{n=1}^\infty) > 0. \tag{9}
\]

Recall that the decay of a sequence of positive numbers $a_n$ is defined by

\[
  \mathrm{decay}(\{a_n\}_{n=1}^\infty) := \sup \Big\{ t : \sum_{n=1}^\infty a_n^{1/t} < \infty \Big\}.
\]

For instance, the decay of $a_n = 1/(n^\beta \cdot \ln^a(n))$ is equal to $\beta$. We also assume that
the $\zeta_{1,n}$ form a complete orthonormal system in $H$. It is well known that the constant $C_0$
is equal to the square root of the largest eigenvalue of $W_1$, i.e.,

\[
  C_0 = \sqrt{\lambda_{1,1}}.
\]

3.2.1. General Case. Consider the operator

\[
  W_u = S_u^* \circ S_u : H_u \to H_u
\]

for the space $H_u$. Due to the tensor product structure of $S_u$ and $H_u$, the eigenpairs of
$W_u$ are provided by the products of the eigenpairs for the univariate case. Let $\{\lambda_{u,n}\}_{n=1}^\infty$
be the set of all the eigenvalues of $W_u$ listed in decreasing order, $\lambda_{u,n} \ge \lambda_{u,n+1}$. We
now use a standard technique to estimate these eigenvalues. For that purpose note
that, for any

\[
  \tau > 1/\alpha,
\]

we have

\[
  \sum_{n=1}^\infty \lambda_{u,n}^\tau = [L(\tau)]^{|u|} \quad \text{with} \quad
  L(\tau) := \sum_{n=1}^\infty \lambda_{1,n}^\tau < \infty.
\]

Therefore the $n$th largest eigenvalue $\lambda_{u,n}$ satisfies

\[
  n \cdot \lambda_{u,n}^\tau \le [L(\tau)]^{|u|}, \quad \text{i.e.,} \quad
  \lambda_{u,n} \le \Big( \frac{[L(\tau)]^{|u|}}{n} \Big)^{1/\tau}.
\]
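The standard technique used above can be illustrated on a truncated spectrum. In the sketch below the univariate eigenvalues $\lambda_{1,n} = 1/(2 n^2)$ are a hypothetical sequence with decay 2 chosen purely for the example, and the truncation to finitely many terms is likewise an assumption; the check is that $n \cdot \lambda_{u,n}^\tau$ never exceeds $[L(\tau)]^{|u|}$ when $L(\tau)$ is computed from the same truncated sequence.

```python
from itertools import product
from math import prod


def univariate_eigs(n_terms: int = 30):
    """Hypothetical univariate eigenvalues lambda_{1,n} = 1 / (2 n^2), n = 1..n_terms."""
    return [1.0 / (2.0 * n * n) for n in range(1, n_terms + 1)]


def tensor_eigs(lam, size: int):
    """Eigenvalues of W_u for |u| = size: all products of univariate eigenvalues, sorted."""
    return sorted((prod(combo) for combo in product(lam, repeat=size)), reverse=True)


def bound_holds(lam, size: int, tau: float) -> bool:
    """Check n * lambda_{u,n}^tau <= L(tau)^{|u|} for every n (up to rounding)."""
    L = sum(v**tau for v in lam)
    eigs = tensor_eigs(lam, size)
    return all(n * v**tau <= L**size * (1 + 1e-12) for n, v in enumerate(eigs, start=1))
```

The inequality holds because the eigenvalues are sorted in decreasing order, so $n \cdot \lambda_{u,n}^\tau$ is at most the sum of the first $n$ terms $\lambda_{u,j}^\tau$, which in turn is at most the full sum.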



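Let $\zeta_{u,n}$ be the normalized eigenfunction corresponding to the eigenvalue $\lambda_{u,n}$. It is
well known, see, e.g., [H], that the algorithms

\[
  A^*_{u,n}(f) = \sum_{j=1}^n \langle f, \zeta_{u,j} \rangle_{H_u} \cdot \zeta_{u,j}
\]

have the minimal errors among all algorithms using $n$ functional evaluations and

\[
  \mathrm{error}(A^*_{u,n}; H_u, \mathcal{G}_u) = \sqrt{\lambda_{u,n+1}}
  \le \Big( \frac{[L(\tau)]^{|u|}}{n+1} \Big)^{1/(2 \cdot \tau)}.
\]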

Since the $H_u$ are orthogonal subspaces of $\mathcal{H}_d$, the algorithms $A^*_{u,n}$ are naturally
extendable to $\mathcal{H}_d$ and

\[
  A^*_{u,n}\Big( \sum_{v \subseteq [1..d]} f_v \Big) = A^*_{u,n}(f_u).
\]

Moreover,

\[
  \mathrm{cost}(A^*_{u,n}) \le n \cdot \$(|u|).
\]
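We are ready to define the algorithms $A_{\varepsilon,d}$ for the weighted space $\mathcal{H}_d$. For $\varepsilon \in (0, 1)$,
let

\[
  A_{\varepsilon,d}(f) := \langle f, 1 \rangle_{\mathcal{H}_d} + \sum_{1 \le |u| \le m_1(\varepsilon, d)} A^*_{u,n_u}(f), \tag{10}
\]

where

\[
  n_u = n_{u,\varepsilon} := \Big\lfloor \frac{[L(\tau)]^{|u|}}{\varepsilon_u^{2 \cdot \tau}} \Big\rfloor
  \quad \text{and} \quad
  \varepsilon_u = \varepsilon_{|u|} := \varepsilon \cdot \frac{d^{|u|/(2(1+\tau))}}{\sqrt{R}} \tag{11}
\]

with

\[
  R = R(\varepsilon, d) := \sum_{k=1}^{m_1(\varepsilon, d)} \binom{d}{k} \cdot d^{-k \cdot \tau/(1+\tau)}.
\]

Since $n_u$ depends on $u$ only via $|u|$, we will sometimes write $n_{|u|}$, or $n_\ell$ if $|u| = \ell$,
instead of $n_u$.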
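Note that

\[
  R \le \sum_{k=1}^{m_1(\varepsilon, d)} \frac{d^{k/(1+\tau)}}{k!}
  \le m_1(\varepsilon, d) \cdot \frac{d^{t/(1+\tau)}}{t!},
\]

where

\[
  t = t(\varepsilon, d) := \min\big( m_1(\varepsilon, d), \lfloor d^{1/(1+\tau)} \rfloor \big).
\]

This follows from the fact that the sequence $d^{k/(1+\tau)}/k!$ increases as long as $k \le d^{1/(1+\tau)}$
and then starts to decrease, as can be easily verified. Hence

\[
  R^{1+\tau} \le \begin{cases}
    d^{m_1(\varepsilon, d)} / \big( (m_1(\varepsilon, d) - 1)! \big)^{1+\tau} & \text{for } d \ge (m_1(\varepsilon, d))^{1+\tau}, \\
    \big[ m_1(\varepsilon, d) \cdot e^{m_1(\varepsilon, d)} \big]^{1+\tau} & \text{otherwise,}
  \end{cases}
\]

where in the second case we replaced $t!$ by $(t/e)^t$ and used the fact that $t \le m_1(\varepsilon, d)$.
This means that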

(R(e, rf)) 1+r < Ci - y?^,, • e -ci/m(in(i/*)) if mi(£> d) > d l/(l+r) 
m(ln(l/£j) 

and 

(i2(£, rf)) 1+T < e Cl/[(l+r)-ln(ln(l/ e) )] . rf d-lnd/e)/ ln(ln(l/ £ )) j f ^ ^ < ± 

Of course, in all the above estimates, we assume that $\varepsilon \le e^{-e}$.

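We now estimate the error of the algorithm $A_{\varepsilon,d}$. Since $A_{\varepsilon,d}(f_u) = 0$ for all $f_u$ with
$|u| > m_1(\varepsilon, d)$, we have

\[
  [\mathrm{error}(A_{\varepsilon,d}; \mathcal{H}_d, \mathcal{G}_d)]^2
  \le \sum_{1 \le |u| \le m_1(\varepsilon, d)} \gamma_{d,u} \cdot [\mathrm{error}(A^*_{u,n_u}; H_u, \mathcal{G}_u)]^2
  + \sum_{|u| > m_1(\varepsilon, d)} \gamma_{d,u} \cdot C_0^{2 \cdot |u|}.
\]

The latter sum satisfies

\[
  \sum_{|u| > m_1(\varepsilon, d)} \gamma_{d,u} \cdot C_0^{2 \cdot |u|}
  = \sum_{k=m_1(\varepsilon, d)+1}^d \binom{d}{k} \Big( \frac{C_0^2}{d} \Big)^k \le \varepsilon^2,
\]

whereas the former sum is bounded by

\[
  \sum_{1 \le |u| \le m_1(\varepsilon, d)} \gamma_{d,u} \cdot \varepsilon_u^2
  = \sum_{\ell=1}^{m_1(\varepsilon, d)} \binom{d}{\ell} \cdot d^{-\ell} \cdot \varepsilon_\ell^2
  = \varepsilon^2 \cdot R^{-1} \cdot \sum_{\ell=1}^{m_1(\varepsilon, d)} \binom{d}{\ell} \cdot d^{-\ell \cdot \tau/(1+\tau)}
  = \varepsilon^2.
\]

This means that

\[
  \mathrm{error}(A_{\varepsilon,d}; \mathcal{H}_d, \mathcal{G}_d) \le \varepsilon \cdot \sqrt{2}.
\]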
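The balancing of the per-level error demands can be checked numerically. The sketch below assumes the forms $R = \sum_{k=1}^{m_1} \binom{d}{k} d^{-k\tau/(1+\tau)}$, $\varepsilon_\ell = \varepsilon \, d^{\ell/(2(1+\tau))}/\sqrt{R}$ and $n_\ell = \lfloor [L(\tau)]^\ell / \varepsilon_\ell^{2\tau} \rfloor$, which are reconstructed from the error and cost estimates in the text; the function names and the sample value of $L(\tau)$ are hypothetical.

```python
import math


def R(m1: int, d: int, tau: float) -> float:
    """R = sum_{k=1}^{m1} binom(d, k) * d^{-k * tau / (1 + tau)}."""
    return sum(math.comb(d, k) * d ** (-k * tau / (1 + tau)) for k in range(1, m1 + 1))


def eps_level(ell: int, eps: float, d: int, tau: float, r: float) -> float:
    """Error demand eps_ell for components with ell active variables."""
    return eps * d ** (ell / (2 * (1 + tau))) / math.sqrt(r)


def n_level(ell: int, eps: float, d: int, tau: float, r: float, L_tau: float) -> int:
    """Number of functional evaluations n_ell used for each u with |u| = ell."""
    return math.floor(L_tau**ell / eps_level(ell, eps, d, tau, r) ** (2 * tau))


def truncated_error_sq(m1: int, eps: float, d: int, tau: float) -> float:
    """sum_{ell=1}^{m1} binom(d, ell) * d^{-ell} * eps_ell^2, which should equal eps^2."""
    r = R(m1, d, tau)
    return sum(
        math.comb(d, ell) * d ** (-ell) * eps_level(ell, eps, d, tau, r) ** 2
        for ell in range(1, m1 + 1)
    )
```

With these choices the weighted sum of the squared level errors equals $\varepsilon^2$ exactly (up to rounding), and the total number of evaluations stays below $\max(L(\tau), [L(\tau)]^{m_1}) \cdot R^{1+\tau}/\varepsilon^{2\tau}$.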
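We now estimate the cost of $A_{\varepsilon,d}$:

\[
  \mathrm{cost}(A_{\varepsilon,d}) \le \$(0) + \sum_{1 \le |u| \le m_1(\varepsilon, d)} \$(|u|) \cdot n_u
  \le \$(0) + \$(m_1(\varepsilon, d)) \sum_{1 \le |u| \le m_1(\varepsilon, d)} n_u
\]

and

\[
  \sum_{1 \le |u| \le m_1(\varepsilon, d)} n_u
  = \sum_{\ell=1}^{m_1(\varepsilon, d)} \binom{d}{\ell} \cdot n_\ell
  \le \frac{R^\tau}{\varepsilon^{2 \cdot \tau}} \cdot \sum_{\ell=1}^{m_1(\varepsilon, d)} \binom{d}{\ell} \cdot [L(\tau)]^\ell \cdot d^{-\ell \cdot \tau/(1+\tau)}
  \le \max\big( L(\tau), [L(\tau)]^{m_1(\varepsilon, d)} \big) \cdot \frac{R^{1+\tau}}{\varepsilon^{2 \cdot \tau}}.
\]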

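We summarize this in the following theorem.

Theorem 1. Suppose that (9) holds. The approximation problem is quasi-polynomially
tractable even if $\$$ is an exponential function of $d$, $\$(d) = O(e^{q \cdot d})$, and is weakly
tractable even if $\$(d) = O(e^{e^{q \cdot d}})$ for some $q \ge 0$. Moreover, for any $\tau > 1/\alpha$, the
algorithms $A_{\varepsilon,d}$ have errors bounded by $\varepsilon \cdot \sqrt{2}$ and cost bounded by

\[
  \mathrm{cost}(A_{\varepsilon,d}) \le \$(0) + \$(m_1(\varepsilon, d)) \cdot \max\big( L(\tau), [L(\tau)]^{m_1(\varepsilon, d)} \big) \cdot \frac{[R(\varepsilon, d)]^{1+\tau}}{\varepsilon^{2 \cdot \tau}},
\]

where $m_1(\varepsilon, d)$ is given by (7), e.g.,

\[
  m_1(\varepsilon, d) \le C_1 \cdot \frac{\ln(1/\varepsilon)}{\ln(\ln(1/\varepsilon))} \quad \text{for } \varepsilon \le e^{-e},
\]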

and 

^I^^-^ Cl/ln(ln(1/£)) if m 1 (e,d)>d^\ 



[R(s,d)] l ^< 



ln(ln(l/ £ )) 

£ Ci/[(l+r)an(ln(l/ £ ))] . rf Ci-ln(l/e)/ In(ln(l/e)) otherwise. 



We believe that the result on quasi-polynomial tractability is sharp in general, i.e.,
there exist $H$ and $\mathcal{G}_1$ such that the corresponding multivariate problem with weights
$\gamma_{d,u} = d^{-|u|}$ is only quasi-polynomially tractable. However, as we prove in the next
section, Theorem 1 is not sharp when (1) holds.

3.2.2. Special Case (1). We begin this section by assuming for a moment that

\[
  \Big\| \sum_{u \subseteq [1..d]} f_u \Big\|_{\mathcal{G}_d}^2 = \sum_{u \subseteq [1..d]} \|f_u\|_{\mathcal{G}_d}^2
  \quad \text{for every } f \in \mathcal{H}_d, \tag{12}
\]

which is a stronger assumption than (1). Similar spaces with norms satisfying (12) have
been considered in [161 ETZ] for functions with infinitely many variables ($d = \infty$) and
some of the results below follow from [16].

As shown in [16], (12) allows for a simple characterization of the spectrum of

\[
  W_d = S_d^* \circ S_d : \mathcal{H}_d \to \mathcal{H}_d
\]

in terms of the spectrum of $W_1$. Indeed, the eigenvalues of $W_d$ are given by

\[
  \gamma_{d,u} \cdot \prod_{j \in u} \lambda_{1,k_j}
\]

for all $u$ and $k_j \in \mathbb{N}_+$. For $u = \emptyset$, 1 is the corresponding eigenvalue.
Let $\lambda_{d,n}$ ($n \in \mathbb{N}_+$) be the eigenvalues of $W_d$ ordered so that

\[
  \lambda_{d,n} \ge \lambda_{d,n+1} \quad \text{for all } n.
\]

Let $\eta_{d,n}$ be the corresponding eigenfunctions that form a complete orthonormal system
in $\mathcal{H}_d$. They also have a tensor product form, and $\eta_{d,n}$ corresponding to the eigenvalue
$\gamma_{d,u} \prod_{j \in u} \lambda_{1,k_j}$ has all the active variables listed in $u$.
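Define

\[
  A^*_{\varepsilon,d}(f) := \sum_{j=1}^{n(\varepsilon, d)} \langle f, \eta_{d,j} \rangle_{\mathcal{H}_d} \cdot \eta_{d,j}
  \quad \text{with} \quad n(\varepsilon, d) := \min \{ k : \lambda_{d,k+1} \le \varepsilon^2 \}.
\]

It follows from [16] that $A^*_{\varepsilon,d}$ is optimal for any cost function $\$$, i.e.,
$\mathrm{error}(A^*_{\varepsilon,d}; \mathcal{H}_d, \mathcal{G}_d) \le \varepsilon$ and

\[
  \mathrm{cost}(A^*_{\varepsilon,d}) = \min \{ \mathrm{cost}(A) : \mathrm{error}(A; \mathcal{H}_d, \mathcal{G}_d) \le \varepsilon \}
  = \mathrm{comp}(\varepsilon; \mathcal{H}_d, \mathcal{G}_d).
\]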



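Now $\lambda_{1,1} = C_0^2$,

\[
  m_2(\varepsilon, d) = \min \Big( d, \; \Big\lceil \frac{\ln(1/\varepsilon^2)}{\ln(d/\lambda_{1,1})} \Big\rceil - 1 \Big),
\]

and the functional evaluations $\langle f, \eta_{d,j} \rangle_{\mathcal{H}_d}$ used by the algorithm have at most $m_2(\varepsilon, d)$
active variables.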

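Note that, for every $\delta > 0$, we can bound $m_2(\varepsilon, d)$ by

\[
  m_2(\varepsilon, d) \le \max\big( \lambda_{1,1} \cdot e^{1/\delta}, \; \delta \cdot \ln(1/\varepsilon^2) \big). \tag{13}
\]

Indeed, (13) trivially holds if $d \le \lambda_{1,1} \cdot e^{1/\delta}$, and

\[
  \frac{\ln(1/\varepsilon^2)}{\ln(d/\lambda_{1,1})} \le \frac{\ln(1/\varepsilon^2)}{\ln(e^{1/\delta})} = \delta \cdot \ln(1/\varepsilon^2)
  \quad \text{if } d > \lambda_{1,1} \cdot e^{1/\delta}.
\]

Take now

\[
  \tau > 1/\alpha,
\]

where, as before, $\alpha = \mathrm{decay}(\{\lambda_{1,n}\}_{n=1}^\infty) > 0$. Using a standard technique, we get

\[
  k \cdot \lambda_{d,k}^\tau \le \sum_{j=1}^\infty \lambda_{d,j}^\tau
  = \sum_{u \subseteq [1..d]} \gamma_{d,u}^\tau \sum_{k \in \mathbb{N}_+^{|u|}} \prod_{\ell=1}^{|u|} \lambda_{1,k_\ell}^\tau
  = \sum_{k=0}^d \binom{d}{k} \Big( \frac{L(\tau)}{d^\tau} \Big)^k
  = \Big( 1 + \frac{L(\tau)}{d^\tau} \Big)^d
  \le e^{L(\tau) \cdot d^{1-\tau}} < \infty.
\]

Hence

\[
  \lambda_{d,k} \le e^{L(\tau) \cdot d^{1-\tau}/\tau} \cdot k^{-1/\tau}
  \quad \text{and} \quad
  n(\varepsilon, d) \le \Big\lceil e^{L(\tau) \cdot d^{1-\tau}} \cdot \varepsilon^{-2 \cdot \tau} \Big\rceil - 1.
\]

Note that the term $e^{L(\tau) \cdot d^{1-\tau}}$ is bounded from above by $e^{L(\tau)}$ if $\tau \ge 1$, and converges to
1 with increasing $d$ if $\tau > 1$.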
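The bound on $n(\varepsilon, d)$ can be compared with a direct count on a small example. The sketch below builds the spectrum of $W_d$ from a truncated, hypothetical univariate sequence $\lambda_{1,n} = 1/(2 n^2)$ (both the sequence and the truncation are assumptions of the illustration), counts the eigenvalues exceeding $\varepsilon^2$, and evaluates the bound $\lceil e^{L(\tau) d^{1-\tau}} \varepsilon^{-2\tau} \rceil - 1$ with $L(\tau)$ computed from the same truncated sequence.

```python
from itertools import product
from math import ceil, comb, exp, prod


def univariate_eigs(n_terms: int = 20):
    """Hypothetical univariate eigenvalues lambda_{1,n} = 1 / (2 n^2)."""
    return [1.0 / (2.0 * n * n) for n in range(1, n_terms + 1)]


def weighted_spectrum(lam, d: int):
    """Eigenvalues d^{-|u|} * prod_{j in u} lambda_{1,k_j} over all subsets u of [1..d]."""
    eigs = []
    for size in range(d + 1):
        weight = d ** (-size)
        for combo in product(lam, repeat=size):
            # Each of the binom(d, size) subsets of this size contributes the same values.
            eigs.extend([weight * prod(combo)] * comb(d, size))
    return sorted(eigs, reverse=True)


def n_eps(eigs, eps: float) -> int:
    """n(eps, d) = min{k : lambda_{d,k+1} <= eps^2}, i.e. the count of eigenvalues > eps^2."""
    return sum(1 for v in eigs if v > eps**2)


def n_bound(lam, d: int, eps: float, tau: float) -> int:
    """Upper bound ceil(exp(L(tau) * d^{1-tau}) * eps^{-2 tau}) - 1."""
    L = sum(v**tau for v in lam)
    return ceil(exp(L * d ** (1 - tau)) * eps ** (-2 * tau)) - 1
```

For $d = 3$, $\varepsilon = 0.1$ and $\tau = 1$ the direct count is 16, well below the bound, which is generous because it only uses the summability of the eigenvalues.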

We return now to the original assumption (1). By replacing $\varepsilon$ by $\varepsilon/\sqrt{C}$ in all the
formulas above, we get that $A^*_{\varepsilon/\sqrt{C},d}$ has the error bounded by $\varepsilon$ when the norm in $\mathcal{G}_d$
satisfies

\[
  \Big\| \sum_{u \subseteq [1..d]} f_u \Big\|_{\mathcal{G}_d}^2 = C \cdot \sum_{u \subseteq [1..d]} \|f_u\|_{\mathcal{G}_d}^2
  \quad \text{for all } f \in \mathcal{H}_d.
\]

Moreover, all upper bounds on the cost and errors provide corresponding upper bounds
for norms that satisfy only (1), i.e., when the above equality is replaced by inequality.
This yields the following theorem.

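Theorem 2. Suppose that (1) and (9) hold. Then for any $\tau > 1/\alpha$,

\[
  \mathrm{comp}(\varepsilon; \mathcal{H}_d, \mathcal{G}_d) \le \$\big( m_2(\varepsilon/\sqrt{C}, d) \big) \cdot \frac{e^{L(\tau) \cdot d^{1-\tau}}}{(\varepsilon/\sqrt{C})^{2 \cdot \tau}}
\]

with

\[
  m_2(\varepsilon/\sqrt{C}, d) \le \min \Big( d, \; \Big\lceil \frac{\ln(C/\varepsilon^2)}{\ln(d/\lambda_{1,1})} \Big\rceil - 1 \Big).
\]

Due to (13), the approximation problem is strongly polynomially tractable with the
exponent

\[
  p^{\mathrm{str}} \le 2 \cdot \max(1, 1/\alpha)
\]

even if $\$(d) = O(e^{q \cdot d})$, and is weakly tractable even if $\$(d) = O(e^{e^{q \cdot d}})$ for some $q \ge 0$.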



We now show that the upper bound on $p^{\mathrm{str}}$ is sharp if (12) holds.

Proposition 3. If (12) and (9) hold then

\[
  p^{\mathrm{str}} = 2 \cdot \max(1, 1/\alpha).
\]

Proof. Even for $d = 1$, we have $\mathrm{comp}(\varepsilon; H, \mathcal{G}_1) = \Omega(\varepsilon^{-2/\alpha})$. Hence we only need to
consider the case $\alpha > 1$. Suppose to the contrary that $p^{\mathrm{str}} = p$ for $p < 2$. The complexity
of the problem with any cost function $\$$ satisfying our assumptions is bounded from
below by the complexity when $\$(d) = 1$ for all $d$, and the latter complexity is fully
determined by the eigenvalues of $W_d$. That is, we have

\[
  \mathrm{comp}(\varepsilon; \mathcal{H}_d, \mathcal{G}_d) \ge \$(0) \cdot \min \{ k : \lambda_{d,k+1} \le \varepsilon^2 \}
\]

for any cost function $\$$. Take any $\bar{p} \in (p, 2)$. Then there is $c(\bar{p})$ such that

\[
  \lambda_{d,k} \le c(\bar{p}) \cdot k^{-2/\bar{p}} \quad \text{for all } k \ge 1 \text{ and } d \ge 1.
\]

However, then, for any $q > \bar{p}$,

\[
  \sum_{k=1}^\infty \lambda_{d,k}^{q/2} \le \sum_{k=1}^\infty (c(\bar{p}))^{q/2} \cdot k^{-q/\bar{p}} < \infty
  \quad \text{for all } d \ge 1. \tag{14}
\]

Take $q \in (\bar{p}, 2)$. As already explained,

\[
  \sum_{k=1}^\infty \lambda_{d,k}^{q/2} = \Big( 1 + \frac{L(q/2)}{d^{q/2}} \Big)^d,
\]

which converges to infinity as $d \to \infty$. This contradicts (14) and completes the proof.

□

We apply Theorem 2 to the following $L_2$ approximation problem.

Example. Consider $K(x, y) = \min(x, y)$, $D = [0, 1]$, and $\rho \equiv 1$. It is well known that
for the corresponding $L_2$ approximation problem, we have $\alpha = 2$. Hence, for $\$$ at
most exponential in $d$, we have strong tractability with the exponent

\[
  p^{\mathrm{str}} \le 2.
\]